BIOS 635: Bias and variance trade-off, Classification, K-nearest neighbor

Kevin Donovan

1/26/2021

Review

Supervised Learning


Mean Squared Error Decomposition

Recall: the \(MSE\) of an estimate \(\hat{f}\) at \(X=x\) can be decomposed into \[MSE_{\hat{f}}(x)=E[(Y-\hat{f}(X))^2|X=x]=[f(x)-\hat{f}(x)]^2+Var(\epsilon)\]

Now consider taking the expectation marginally, i.e., across \(Y\), \(X\), and the training data used to fit \(\hat{f}\).

Can show \[E[(Y-\hat{f}(X))^2]=E_x[\text{bias}(\hat{f}(x))^2]+E_x[\text{Var}(\hat{f}(x))]+\text{Var}(\epsilon)\]

where \(\text{bias}(\hat{f}(x))=\text{E}[\hat{f}(x)]-f(x)\)
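A short sketch of why this decomposition holds, writing \(Y=f(X)+\epsilon\) with \(E[\epsilon]=0\) and \(\epsilon\) independent of the training data:

\[\begin{aligned}
E[(Y-\hat{f}(x))^2] &= E[(f(x)-\hat{f}(x)+\epsilon)^2]\\
&= E[(f(x)-\hat{f}(x))^2]+\text{Var}(\epsilon) \quad \text{(cross term vanishes since } E[\epsilon]=0\text{)}\\
&= \{E[\hat{f}(x)]-f(x)\}^2+\text{Var}(\hat{f}(x))+\text{Var}(\epsilon)\\
&= \text{bias}(\hat{f}(x))^2+\text{Var}(\hat{f}(x))+\text{Var}(\epsilon)
\end{aligned}\]

Averaging the first two terms over \(X\) yields the marginal decomposition.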

Bias-variance trade-off

The decomposition above shows that both the bias and the variance of \(\hat{f}\) increase the expected model error

This creates a trade-off: more flexible models tend to have lower bias but higher variance, while less flexible models have higher bias but lower variance

Classification

Suppose instead response \(Y\) is categorical

e.g. cancer stage is one of \(C=(0, 1, 2, 3, 4)\) where \(0\) indicates cancer-free

Goals:

Build a classifier \(f(x)\) that assigns one of the \(K\) class labels to a new observation

Assess the uncertainty in each classification

Classification

What to model?

Let \(p_k(x)=\text{Pr}(Y=k|X=x)\), \(k=1,2,\ldots,K\)

These are called the conditional class probabilities at \(x\)

If these are known, can define classifier at \(x\) by

\(f(x)=j\) if \(p_j(x)=\text{max}[p_1(x), \ldots, p_K(x)]\)

This rule is known as the Bayes optimal classifier at \(x\)
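As a minimal sketch (in Python, with hypothetical probabilities; in practice the \(p_k(x)\) are unknown and must be estimated), the Bayes classifier is just an argmax over the conditional class probabilities:

```python
import numpy as np

# Hypothetical conditional class probabilities p_1(x), p_2(x), p_3(x)
# at a single point x (assumed known here purely for illustration)
p = np.array([0.1, 0.6, 0.3])

# Bayes optimal classifier: predict the class k with the largest p_k(x)
f_x = np.argmax(p) + 1  # +1 to match the 1-based labels k = 1, ..., K
print(f_x)  # 2
```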


Classification metrics

Basic:

\(\text{accuracy}=\frac{\#\text{ correct predictions}}{\#\text{ test instances}}\)

\(\text{error}=1-\text{accuracy}\)

These are in general not sufficient (why?)
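One reason: accuracy ignores class imbalance. A small Python sketch with a made-up 95/5 split shows a useless classifier scoring high accuracy:

```python
import numpy as np

# Hypothetical imbalanced test set: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # classifier that always predicts "negative"

accuracy = np.mean(y_pred == y_true)             # 0.95: looks excellent
sensitivity = np.mean(y_pred[y_true == 1] == 1)  # 0.0: misses every positive
print(accuracy, sensitivity)
```

Sensitivity, specificity, and the confusion matrix below expose what accuracy hides.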

Confusion Matrix


Example

During the COVID-19 pandemic, different metrics were used to quantify risk

K-nearest Neighbor (KNN)

Simple and flexible algorithm: given a new point \(x_0\), find the \(K\) training observations closest to \(x_0\), then predict by majority vote among their classes (classification) or by averaging their responses (regression)
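The steps above can be sketched as follows (a minimal Python illustration with a made-up toy dataset, not the course's R/caret workflow; the function name is hypothetical):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5, classification=True):
    """Minimal KNN sketch: Euclidean distance, majority vote or mean."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nn = np.argsort(dists)[:k]              # indices of the k nearest neighbors
    if classification:
        vals, counts = np.unique(y_train[nn], return_counts=True)
        return vals[np.argmax(counts)]      # majority vote among neighbors
    return y_train[nn].mean()               # average response (regression)

# Toy data: two well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))  # 0
```

Note the only tuning parameter is \(K\); larger \(K\) gives smoother (higher-bias, lower-variance) fits.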

Distance Metrics

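KNN requires a notion of "closeness" between points. The slide's own formulas are not reproduced here; as a sketch (Python, with made-up points), three common choices and the Minkowski family that generalizes them:

```python
import numpy as np

x, z = np.array([1.0, 2.0]), np.array([4.0, 6.0])

euclidean = np.sqrt(((x - z) ** 2).sum())  # L2 norm: 5.0
manhattan = np.abs(x - z).sum()            # L1 norm: 7.0
chebyshev = np.abs(x - z).max()            # L-infinity norm: 4.0

# Minkowski distance with order p: p=1 gives Manhattan, p=2 Euclidean
minkowski3 = (np.abs(x - z) ** 3).sum() ** (1 / 3)
print(euclidean, manhattan, chebyshev)
```

Because these distances are scale-sensitive, predictors are typically centered and scaled first (as in the caret output below).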

KNN Examples

  1. Regression
## k-Nearest Neighbors 
## 
## 30 samples
##  2 predictor
## 
## Pre-processing: centered (2), scaled (2) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 30, 30, 30, 30, 30, 30, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5   4.585041  0.8918501   3.700367
##    7   4.845494  0.8979390   3.972826
##    9   5.044867  0.8978793   4.163350
##   11   5.408245  0.9069323   4.400432
##   13   6.119488  0.8964707   4.995359
##   15   6.908745  0.8930579   5.724303
##   17   7.706909  0.8906881   6.415243
##   19   8.613891  0.8744222   7.197625
##   21   9.406709  0.8592141   7.902947
##   23  10.066420  0.8578698   8.499900
##   25  10.842491  0.7900220   9.237286
##   27  11.829461  0.7020191  10.111490
##   29  12.466729  0.6103163  10.753089
##   31  12.744056        NaN  11.034206
##   33  12.744056        NaN  11.034206
##   35  12.744056        NaN  11.034206
##   37  12.744056        NaN  11.034206
##   39  12.744056        NaN  11.034206
##   41  12.744056        NaN  11.034206
##   43  12.744056        NaN  11.034206
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.

KNN Examples

  1. Classification
## k-Nearest Neighbors 
## 
## 30 samples
##  2 predictor
##  2 classes: 'no', 'yes' 
## 
## Pre-processing: centered (2), scaled (2) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 30, 30, 30, 30, 30, 30, ... 
## Resampling results across tuning parameters:
## 
##   k   Accuracy   Kappa    
##    5  0.8625986  0.7151352
##    7  0.8640695  0.7320667
##    9  0.8463797  0.6966267
##   11  0.8348877  0.6892596
##   13  0.8283119  0.6807602
##   15  0.8381627  0.7011166
##   17  0.7941796  0.6165274
##   19  0.7428680  0.5514878
##   21  0.6846963  0.4460085
##   23  0.6910537  0.4625074
##   25  0.5882703  0.2921750
##   27  0.4964599  0.1476145
##   29  0.4181164  0.0000000
##   31  0.4181164  0.0000000
##   33  0.4181164  0.0000000
##   35  0.4181164  0.0000000
##   37  0.4181164  0.0000000
##   39  0.4181164  0.0000000
##   41  0.4181164  0.0000000
##   43  0.4181164  0.0000000
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.


Train and Test Error
